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Abstract. As an example for complex systems with extreme events we investigate 
ocean wave states exhibiting rogue waves. We present a statistical method of data 
analysis based on multi-point statistics which for the first time allows grasping extreme 
rogue wave events in a statistically highly satisfactory manner. The key to the success 
of the approach is mapping the complexity of multi-point data onto the statistics 
of hierarchically ordered height increments for different time scales for which we can 
show that a stochastic cascade process with Markov properties is governed by a Fokker- 
Planck equation. Gonditional probabilities as well as the Fokker-Planck equation itself 
can be estimated directly from the available observational data. With this stochastic 
description surrogate data sets can in turn be generated allowing to work out arbitrary 
statistical features of the complex sea state in general and extreme rogue wave events 
in particular. The results also open up new perspectives for forecasting the occurrence 
probability of extreme rogue wave events, and even for forecasting the occurrence of 
individual rogue waves based on precursory dynamics. 
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The observation and study of waves on the sea is probably one of the oldest scientihc 
and cultural endeavours of mankind. But even today the sea’s state can not be regarded 
as anything else than an enigma to man and science. Of course ocean waves have in¬ 
spired a tremendous number of often groundbreaking results in mathematics, physics 
and related sciences, including nonlinear waves, localisation, extreme events, turbulence 
and many more. But still, the fully irregular and complex state of the sea is far from 
being understood. And both the characteristics of its irregularity, as well as the rare 
but extremely large wave events occurring sometimes, now often called rogue waves, 
abscond satisfying description, even in statistical terms. 

Obviously the difficulties with understanding irregular and extreme or rogue ocean 
waves has to be seen in the context of extreme events in complex systems in general. 
Driven by various motives there has been extensive research on extreme events in a 
many fields, from the sciences, via meteorology and climate change, up to the social 
sciences and economics [D El m SI la E]. It is still a strongly debated question whether 
extreme events are generally linked to some universal stochastic mechanisms or if they 
rather originate through special features of the individual systems under study [7]. Still, 
a common point of all observations is that the empirical data are frequently punctuated 
by extreme events which seem to play an important role. Often an analysis approach is to 
approximate the observations by means of a generalized stochastic model in which some 
variables are represented in terms of stochastic components |8]. Usually the complex 
systems under study are very high dimensional and thus hnding adequate methods to 
model the stochastic components remains a challenge. 

Besides the description of extreme events in complex systems there is also the 
demand of their prediction. Despite the fact that we have irregular and complex behav¬ 
ior of rogue waves, there are increasing number of research towards defining an early 
warning system for rogue wave occurrences PEq] or establishing a prediction method 
for short term prediction of rogue events. Studies on prediction methods mainly relys 
on deterministic behavior of non-st at ionary solutions of the underlying wave equations 
[mini [13] and also deterministic nonlinear time series analysis na. 

The present work is based on the finding that complex systems can often be 
described highly successfully as stochastic processes in scale rather than time or space. 
Examples are now known for various very different helds, like turbulence |151 [16], 
economics [m [THl [19] , biology [201 EH E2] , and many more, see [23]. With scale dependent 
processes, originally introduced by Friedrich and Peinke [15], fractal and multi-fractal 
structures [23] and even more generally joint n-point statistics [IHIEI] can be reduced by 
Markov properties to particular three point statistics. In a previous study [25] we have 
already shown for ocean wave data that certain scale dependent processes may have 
Markov properties. However, in [25] the Markov properties for the pure scale process 
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could only be derived for deliberately pre-filtered data. In the present contribution we 
extend our previous investigations and base our analysis of the wave dynamics on general 
joint (TV + l)-point probability density functions (PDF) p{h(t), h(t — ti), h(t — tn))- 
Here h{t) denotes the water surface elevation measured at a given location at time t and 
Ti are different time increments. The joint PDF provides the likelihood of a sequence 
of water surface elevation heights for iV + 1 different instants of time. We show how 
a Fokker-Planck equation can be derived which describes these general joint PDFs. 
Knowing the corresponding stochastic process for the general multi-point statistics 
we can show that also the extreme events, i.e. the rogue waves, are grasped by this 
stochastic approach. The approach also allows time-series reconstruction in a statistical 
sense, and thus a statistically valid prediction of rogue wave occurrence. 

The paper is structured as follows. First the mathematical aspects of multi-point 
and multi-scale description as well as the connection to scale dependent stochastic 
processes are introduced. Then the validity of the description based on observational 
ocean wave data is demonstrated. Finally the approach is applied to reconstruct time 
series for the underlying observational data and to forecast the occurrence probability 
of rogue waves in the given sea state. 


2. Multi-point statistics 


In this section the statistical background of our approach for a multi-point reconstruction 
is presented. In the following we use the short-hand notation hi := h{ti) for the elevation 
of the water surface measured at a given location at time U, with hi+j := h{ti -\-tj). We 
dehne the relative change in surface height over a time interval or, respectively, a time 
scale Tj as 


= ^( d ) := H'ti) - Kti - d )- (1) 

The aim is to calculate the joint probability p{h*,t*] hi,t* ~ 'Ti] ■■■] — tn) of 

occurrence of the event {h*,t*}, together with the knowledge of the past points 
{hi,t* — Ti;h 2 ,t* ~ '^ 2 ', ■■■] hN,t* ~ We assume that the system has no explicit 

time dependence, i.e. the system is stationary. The probability of occurrence of the 
event {h*} under the conditions of the past points is given by 

p{h*; hi,ri;...; hAr,rjv) 


p{h*\hi,Ti ,...; Hn^tn) = 


( 2 ) 


p(/ii,ri;...;h7v,rAr) 

Next, the joint (N-l-l)-point PDF can be expressed in an equivalent way by joint 
increments statistics 


p{h*-, hi, Ti; h2, T2] ...; /iAT, tn) = p{h* - hi, rp h* - h2, T2, ...; h* - Hn, tn, h*) 

= p(6;6;-;eiv,h*) 

= p(6;6;-;eiv|h*)-p(h*). (3) 

Note instead of the knowledge of wave heights at N-l-1 points, we consider now the 
knowledge of N height increments and one selected height h*. Without loss of generality 
we take Ti < Tj+i, and thus introduce a hierarchical ordering of the increments ^i. 
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Only if the conditional PDFs do not depend on h*, i. e. if 

= p(6; 6; (4) 

the (N+l)-point statistics reduces to N-scale statistics of the increments at scales Xj. 
In our previous work iza we had applied filtering based on Hilbert-Huang transform 
techniques (HHT) to the wave data. The filtering removed the dependency on h* by 
kind of separating off the underlying dominant frequency, i.e. the wave like nature of 
the system. Still, already in this case Markov properties could be shown for the filtered 
wave data and thus the multi-scale PDF could be factorized in 


p{h*; hi, Ti ,...; Hn, Tn) = p{^i ,...; ^n) ■ p{h*) 

= P(^i|6) • ••• • p{^n-i\^n) ■ p{^n) ■ P{h*). (5) 


In our present work here we do not apply any pre-filtering and stay focussed on the very 
direct data itself. This renders the approach much more general and we directly start 
with the multi-point statistics (Eq. ([2])) to investigate if the Markov property of the 
process is given for 

P(0l0+l)0+2, •••,0+Af) h*) = P(OlO+b h*). (6) 

More specihcally we will first investigate from the observational data if 

holds. This we will take as hint for the validity of Markov property. Using Eq. ([6]), the 
multi-point PDF Eq. ([3]) can then be factorised as 

p{h*-, hi, Ti; hs, Ta;...; h^, tn) = h*)-...-p{^N-i\^N, h*)-p{^N\h*)-p{h*).{8) 


As Eq. ([6]) is nothing else than the Markov property of a stochastic process of evolving 
in the time scale Xj, the evolution of conditional PDFs of Eq. (|8]) can be expressed by 
Kramers-Moyal expansion [26] . 


oo ^ 

n=l 




(9) 


where Kramers-Moyal coefficients are defined as 


= lim 


nldr 


< 


C{Tj -6 t, h*) -^j{Tj,h*) 


> 


( 10 ) 


Note that the pre-factor —x in Eq. ([H]) indicates that we consider the process for 
decreasing x- values and an evolution in log-scale of x. 

If the Kramers-Moyal coefficient is zero, then it follows from the Pawula 
theorem that all coefficients for n > 3 are zero, too cf. izg. The Kramers-Moyal 
expansion then yields a Fokker-Planck equation with just two coefficients. 


d 




d 


dC 


d2 
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denotes the drift and the diffusion coefficient. With this the Fokker-Planck 
equation turns out a suitable description for the conditional probabilities of the water 
surface height increments, from which in turn the general multi-point joint PDF of the 
surface heights themselves, Eq. (| 8 ]), can be determined as 


p{h*\hi,Ti, ...■,hN,TN) 


— -- X p{h* - hi,Ti\h* - h2,T2]h*) X 

^ p{h* - hN-i,rN-i\h* - hN,TN]h*) 

p{hi - hN-i,TN-i - Ti\hi - hN,rN -rp hi,Ti) 
p{h* - hjv,rjv|/j,*) 
p{hi - Hn^tn - ri|hi,ri)' 


Using the increment notation, omitting the r-values and dehning ■= hi — hj with 


the corresponding time scale tj 

p{h*\hi,Ti;...;hN,rN) = 


Ti and j = 2,..., N, this equation simplihes to 

p(h') ^ n"T‘p(6k.+i;V) ^ 

p(hi) nfc'pKi, I5i+i;fci) pKivlfci) 


( 12 ) 


For a given height h* the probability of its occurrence p{h*\hi, ri; hi\f, tn) is given by 
the simple conditional PDFs, which can be calculated from the Fokker-Planck equation, 
or which can be estimated directly from the data. Note the simple conditional PDFs 
p{^i,Ti\^j,Tj-, h*) only contain information about three height values h*,hi,hj; or more 
abstractly, of three points of the time series h{t). Thus Eq. flT^ is a three point closure 
of the multi-point problem. 


3. Results based on observational data 

The wave measurements used in this study were taken in the Sea of Japan, at a location 
3 km off the Yura hshery harbor, where the water depth is about 43 meters, further 
details can be found in 1271 12 H 11211 EU]. First we want to examine if the conditional 
PDFs depend on the wave height itself by comparing both sides of the equation 

P(ei| 6 ;/^*)=P( 6 I 6 ). (13) 

In Fig. [1] the comparison of conditional PDFs from both sides of Eq. ffT3l) at scales 
Ti = 14 and T 2 = 28 seconds and for two different values of h* are shown. To get 
sufficient data we always use an interval of h* with ±(J/i/4 (where ah = \J (h^) ). For 
h* = 0 in Fig. [T](a) both distributions are almost the same but for values of hg 7 ^ 0, 
like in Fig. [T](c) a signihcant shift of the red contour plot (solid lines), which is the left 
hand side of Eq. ([T2]) . is found. As .epsa result from this one can clearly deduce that 
the conditional PDFs p(^i|^ 2 ; h*) do depend on h*. 

Next the Markov properties according to Eq. ([7]) can be checked. Note that we 
have to compare two data sets according to and and the size of 

each of these data sets is very different. The verihcation is thus performed by the use 
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Figure 1. Contour plots of the conditional PDFs _p(^i|^ 2 ) (dashed black lines) and 
piC\^ 2 ',h*) (solid red lines) with = ^(r^) for scales ti = 14 and T 2 = 2ti, for 
h* = 0 ± tT?i/4 (a) and h* = 2ah ± cr?i/4 (c). Cuts through the conditional PDFs for 
fixed values of ^2 = — 0.5cr/i in (a) and (c) are shown in (b) and (d) respectively. 


of the Wilcoxon test [I6] as this test is suitable to compare the statistical similarity 
of two sample sets of different sizes. The validity of the Wilcoxon test can be shown 
by the normalized expectation value < AQ* > of the number of inversions of the 
conditional wave height increments and 'Cil€j+i;€j+ 2 ;/**- Markov properties are 

given, < AQ* > has a value of ^ 0.8. The values of < AQ* > in Fig. |2] for 

different values of h* show that Markov properties hold for (r > 14 seconds). This 
dehnes a hnite minimum step size or scale in the Markov process of the evolution of 
the surface elevation increments .^pSuch a hnite step size is well known for stochastic 
processes in general BH, and the scale is called the Einstein-Markov length, which has 
for example also been found in a similar way for turbulent how data, cf. [32l [33]. The 
scale has been marked by a vertical red dashed line in Fig. O Note that compared to 
our previous work [25] the Markov properties are fulhlled without applying a Hilbert- 
Huang Transform (HHT) to the original data, which is due to the fact that we have now 
included the dependencies on h*. 
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Figure 2. Wilcoxon test of Eq. [7] for different values of h*. 


Based on the finding that Markov properties are fulfilled for the evolution of water 
surface height increments with decreasing time scale tj we can now proceed to 
estimate the corresponding stochastic process via the above mentioned Kramers-Moyal 
coefficients. Based on the knowledge of the conditional PDF like shown in Fig. [T]the 
conditional average in Eq. (lTn|l is known too. The estimation of hm 5 ^_j.o causes some 
problems, in particular due to the Einstein-Markov length, but has been worked out 
already in several publications [161 EH [35] • Besides this direct estimation we optimize 
the obtained functional forms of and by minimizing the differences between 
measured conditional PDFs and those obtained by numerical solutions of the resulting 
Fokker-Planck equation [36|. Fig. [3] shows the estimated drift and diffusion functions, 
T, h) and r, h), of ocean wave surface elevation data for r = 140 seconds 

and different values of wave height h*. For h* ^ 0 the curves are shifted 

in the vertical direction, whereas no significant change is found for the diffusion term. 
Furthermore the fourth order Kramers-Moyal coefficient is indeed found 

to be close to zero for different values of h*, thus the Fokker-Planck description we 
propose in Eq. fITT]) can be assumed to be valid. Note that the surface elevation height 
increments, are given in the units of their standard deviation in the limit r —)■ oo, 
(Too, which is identical to y/^ah = \j2{h‘^) [T6] . 
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Figure 3. Drift, D^^\CT,h), diffusion, and the Kramers-Moyal 

coefficient r, h) for different value of wave height, /i, at r = 140 s. 
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To ease parameterisation, the drift and diffusion terms can be approximated by 
first and second order polynomials in 

r, h) = dio{T, h) - 

=d2o(r)-d2i(T)^ + d2i(r)^^. (14) 

The height dependency of the drift function is expressed by the dio{h*, r)-coefiicient and 
our results are shown in Fig. 01 The results indicate once more that we have a strong 
wave height dependency in our process. 



t/tem 


Figure 4. coefficient dio from Eq. [14] as a function of r for different wave surface 
height values, h*. The dotted lines are the second-order polynomial fits in r. 


4. Reconstruction of time series 

The knowledge of the conditional probabilities p{h*\hi, ri,..., h^, tn) and its estimation 
by Eq. (IT^ can be used to generate a new data point h*. Shifting the procedure by 
one step and repeating the same procedure may be used to generate new surrogate 
time series. For technical reasons one should avoid zeros in conditional pdfs if one 
uses Eq. flT^ . Here we used kernel density estimation which is very helpful for 
parameter ranges for which we have only limited data [33 |38]. The initial idea for 
reconstructing time-series following this procedure was originally developed in a similar 
way for fluid turbulence data, see [39]. The time scales we use here for this process 
are Tn = n ■ tem where n = 1, 2,..., 7 and the Einstein-Markov time scale tem = 14 
seconds, as shown in Fig. |2l (The maximal value of n = 7 was chosen, as for that time 
step the autocorrelation of the height increments approaches zero.) In Fig. [5](a) typical 
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time series obtained is shown. In the hgure part (d) and (e) two selected conditional 
probabilities p{h*\hi,Ti,..., hj^,TN) are shown to illustrate our method. In addition 
to the conditional probabilities the single event probability p{h*) = p{h) of all height 
values is shown (red curve). These hgures show clearly how the conditional probabilities 
change with hi, ri,..., /iat, tat the values of the N wave heights seen before. There are 
cases when smaller h*- values are expected in the next step, see Fig. |5] (b), and there 
are cases when large h*- values become highly likely, see Fig. [5](c). 
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Figure 5. Reconstructed time series (a) after Eq. [T2l Two time windows are marked 
by (&) and (c) for which the corresponding multi- conditioned PDFs are given (d) 
and (e). To show the changing volatility the multi- conditioned PDFs (black), the 
unconditional PDFs (red) estimation from all data are shown too. Note the obvious 
changes of the likelihood of large wave amplitudes. 

To illustrate that the reconstructed time series are indeed statistically similar to the 
measured wave data we repeat the above mentioned Fokker-Planck analysis. In Fig. [6] 
we show that from the surrogate data we obtain the same drift and diffusion coefficients. 
Also the corresponding PDFs p{^i,Ti) obtained from the measured data and from the 
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numerical solution of the Fokker-Planck equation using the estimated drift and diffusion 
terms are the same as shown in Fig. [7|^a). Furthermore the statistics of the wave height 
maxima are well grasped by the reconstructed data, see Fig. d^b). Both empirical and 
reconstructed data follow a generalized gamma distribution very well, as expected from 
[25] . From this verihcation of the obtained stochastic process we conclude that both, 
the empirical data and the reconstructed data, have the same multi-point statistics. 




CJoo Ooo 


Figure 6. the drift and diffusion h) coefficients of ocean wave 

data in time scale t = 280 s. The black dots are the original data and hollow circles 
are from reconstructed time series. 

Based on the proposed reconstruction of time series it is now possible to generate 
long synthetic time series to work out further statistical features of the wave data. 
We have chosen 1000 data points of empirical data as initial condition and run it to 
produce 1.1 x 10® synthetic data with sampling rate of 1 Hz. In this reconstructed time 
series we have captured three events that we could consider as rogue waves, using the 
usual definition M. saying h* must be larger than 2 times the signihcant wave height, 
which is 2.4 m for our data. The corresponding three sections of the reconstructed time 
series are shown in Fig. [HI (b,c and d). Also, we performed 4096 different runs of 2048 
seconds blocks. From these data we captured 33 time series with extreme values and a 
corresponding waiting time of about 2.5 x 10® seconds to obtain an extreme event, or a 
rogue wave. 

Next we discuss the possibility to forecast emerging rogue waves. From the 
conditional probability. Fig. [5]^e) (Black curve) we can quantify the likelihood of the 
appearance of the measured amplitude of hr > 5.2 m by integration 

poo 

Pextreme= p(h* | hi, Ti; . . . ;/Itv, Tjv)dh* (15) 

J fX'p 

and obtain 23.6%. This likelihood Pextreme{hr > 5.2 m) can be evaluated for each time 
step and result in the changing risk of emerging rogue waves, as shown in Fig. [9l This 




Surface elevation [m] 
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Figure 7. (a) Empirical (hollow circles) and reconstructed (red filled symbols) PDFs 
for different scales. Time scales r = 14,28,42,56,70 and 84 are chosen and PDFs 
are shifted in vertical direction for clarity of presentation, (b) Distribution of wave 
height maxima for empirical and reconstructed data in normal and log (inset plot) 
scale, which follows gamma distribution. 
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Figure 8. Three different part of stochastic reconstructed time series (b,c and d) 
based on multi point PDFs, from the empirical data as initial conditions (a). 


probabilistic characterization of extreme events returns some false alarms and as well 
some true hits. 
















Capturing rogue waves by multi-point statistics 


13 


40 


CD 

E 

CD 



1.8x10^ 2.0x10^ 6.6x10^ 6.8x10^ 


Time [Sec] Time [Sec] 


Figure 9. Reconstructed time series (down) and probability of having an extreme 
event for each reconstructed point, Pextreme- (up) 

A common method to test the quality of a prediction is the receiver operating 
characteristic curve (ROC) [HI |42l |43]. The idea of ROC consists of comparing the 
rate of true predicted events with the rate of false alarm. The most quantitative index 
describing a ROC curve is the area under it which is known as accuracy. In Fig. [TOl we 
have plotted ROC curves for our prediction, Pextreme^ first by considering hr = 5.2 m 
to detect the extreme event alarms. The corresponding ROC curve is plotted in black 
(solid) line. To investigate the robustness of our reconstruction method, we considered 
lower amplitude wave height for hr = 2.5 m and hr = 3.5 m and the corresponding 
ROC curves are shown in Fig. [10] in red (dotted) and blue (dash) lines, respectively. In 
all three cases we have accuracy of ROC curves bigger than 80% which indicate that 
our multi-point procedure is a proper method for time series reconstruction and can be 
used for short time prediction purposes. 
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Figure 10. ROC curve for three different estimation of Pexterme, for hr = 5.2 m (solid 
black line), hr = 3.5 m (dash blue line) and hr = 2.5 m (dotted red line) 


5. Conclusions 

We have presented a new approach for a comprehensive analysis of the complexity of 
ocean wave dynamics. The complexity of multi-point statistics can be simplihed by a 
three point closure, based on which an arbitrary N-Point statistics can be expressed 
by a hierarchy of nested three point statistics ordered in a cascade like structure. We 
have been able to show for the hrst time that by our stochastic approach not only the 
joint N-point statistics can be grasped but also extreme events, rogue waves, can be 
captured statistically. We have also shown how for each instant in time the conditional 
probability of the next wave height can be determined. As the height prohle of waves 
changes from moment to moment, also the probability of the next value of the wave 
height is changing dynamically. These changes may thus clearly give rise to measures 
indicating the risk of the appearance of rogue waves ahead of their actual emergence. 
Most interestingly this was possible although in the measured data only one event of 
rogue wave was recorded. From our analysis of the occurrence probabilities it becomes 
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clear that the rogue wave for this wave conditions is integral part of the entire complex 
stochastic. In our opinion this can only be achieved as we were able to crave out an 
N-point approach for this complex system. 
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